A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Pre-trained large language models (LLMs) have recently achieved better
generalization and sample efficiency in autonomous web navigation. However,
performance on real-world websites still suffers from (1) open domainness, (2)
limited context length, and (3) lack of inductive bias on HTML. We
introduce WebAgent, an LLM-driven agent that can complete the tasks on real
websites following natural language instructions. WebAgent plans ahead by
decomposing instructions into canonical sub-instructions, summarizes long HTML
documents into task-relevant snippets, and acts on websites via Python
programs generated from those snippets. We design WebAgent with Flan-U-PaLM,
for grounded code generation, and HTML-T5, a new pre-trained LLM for long HTML
documents that uses local and global attention mechanisms and a mixture of
long-span denoising objectives, for planning and summarization. We empirically
demonstrate that our recipe improves the success rate on a real website by
over 50%, and that HTML-T5 is the best model for solving HTML-based tasks,
achieving a 14.9% higher success rate than prior SoTA on the MiniWoB web
navigation benchmark and better accuracy on offline task planning evaluation.
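The plan/summarize/act loop described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: the `plan`, `summarize`, and `generate_program` functions are placeholders standing in for calls to HTML-T5 (planning and summarization) and Flan-U-PaLM (code generation).

```python
# Hypothetical sketch of WebAgent's three-stage loop. The model calls are
# stubbed out; a real agent would query HTML-T5 and Flan-U-PaLM here.

def plan(instruction, history):
    """Decompose the instruction into the next canonical sub-instruction."""
    # Stub: HTML-T5 would produce the next sub-instruction from history.
    return f"sub-step {len(history) + 1} of: {instruction}"

def summarize(html, sub_instruction):
    """Extract task-relevant snippets from a long HTML document."""
    # Stub: HTML-T5 would rank and return relevant element snippets.
    return html[:200]

def generate_program(sub_instruction, snippet):
    """Synthesize an executable program for the sub-instruction."""
    # Stub: Flan-U-PaLM would emit real browser-automation code.
    return f"print('executing: {sub_instruction}')"

def web_agent(instruction, html, max_steps=3):
    history = []
    for _ in range(max_steps):
        sub = plan(instruction, history)
        snippet = summarize(html, sub)
        program = generate_program(sub, snippet)
        exec(program)  # act on the website via the generated program
        history.append(sub)
    return history

steps = web_agent("book a flight", "<html><body>...</body></html>")
```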
Understanding HTML with Large Language Models
Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding --
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval -- have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.
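The text-to-text framing that makes encoder-decoder models such as T5 a natural fit for semantic classification can be sketched as below. The prompt format and label set here are illustrative assumptions, not the paper's exact preprocessing.

```python
# Sketch of framing semantic classification of HTML elements as a
# text-to-text task: the source is a prompt over an HTML snippet, the
# target is the element's semantic label. Format is hypothetical.

def make_example(html_snippet, salient_element_id, label):
    """Build a (source, target) pair for fine-tuning a seq2seq model."""
    source = f"classify element id={salient_element_id}: {html_snippet}"
    target = label  # e.g. "username", "password", "submit"
    return source, target

src, tgt = make_example(
    '<input id="3" type="text" name="user">', "3", "username")
```

A fine-tuned model then simply decodes the label string, which is what lets a standard pretrained LLM transfer to the task without a dedicated classification head.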
Small-scale proxies for large-scale Transformer training instabilities
Teams that have trained large Transformer-based models have reported training
instabilities at large scale that did not appear when training with the same
hyperparameters at smaller scales. Although the causes of such instabilities
are of scientific interest, the amount of resources required to reproduce them
has made investigation difficult. In this work, we seek ways to reproduce and
study training stability and instability at smaller scales. First, we focus on
two sources of training instability described in previous work: the growth of
logits in attention layers (Dehghani et al., 2023) and divergence of the output
logits from the log probabilities (Chowdhery et al., 2022). By measuring the
relationship between learning rate and loss across scales, we show that these
instabilities also appear in small models when training at high learning rates,
and that mitigations previously employed at large scales are equally effective
in this regime. This prompts us to investigate the extent to which other known
optimizer and model interventions influence the sensitivity of the final loss
to changes in the learning rate. To this end, we study methods such as warm-up,
weight decay, and the µParam (Yang et al., 2022), and combine techniques to
train small models that achieve similar losses across orders of magnitude of
learning rate variation. Finally, to conclude our exploration we study two
cases where instabilities can be predicted before they emerge by examining the
scaling behavior of model activation and gradient norms.
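The learning-rate sensitivity measurement described above can be illustrated on a toy problem. This is a hedged sketch using a simple quadratic objective, not the Transformer setup of the paper: sweeping the learning rate over orders of magnitude and recording the final loss shows both the stable regime and divergence at high learning rates.

```python
# Illustrative sketch (not the paper's setup): sweep learning rate over
# orders of magnitude on a toy quadratic and record the final loss.
# Divergence at high LR mimics the instabilities studied at scale.
import numpy as np

def final_loss(lr, steps=100):
    """Gradient descent on f(w) = 0.5 * w^2; diverges for large lr."""
    w = 1.0
    for _ in range(steps):
        w -= lr * w              # gradient of 0.5 * w^2 is w
        if not np.isfinite(w) or abs(w) > 1e6:
            return float("inf")  # training diverged
    return 0.5 * w ** 2

lrs = np.logspace(-3, 1, 9)      # learning rates from 1e-3 to 10
losses = [final_loss(lr) for lr in lrs]

# LR sensitivity: spread of final loss across the stable region
stable = [l for l in losses if np.isfinite(l)]
sensitivity = max(stable) - min(stable)
```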
Learning Natural Language Interfaces using Deep Neural Networks
Automating user tasks with natural language utterances, such as answering questions over Wikipedia or booking flight tickets on the Web, is a key component in designing intelligent systems. Natural language is usually preferred as a unified interface for these systems and requires no domain expertise from users; however, understanding a wide range of diverse inputs and resolving errors that occur during this process are still open challenges and the topics of this thesis.
Traditional machine learning systems for natural language interfaces usually require large-scale labeled datasets with handcrafted rules to train and evaluate the respective models. Firstly, the handcrafted design constrains the scope to a limited set of domains and prevents adaptation to new tasks. Additionally, large-scale labeled data collection is generally domain dependent, costly, and time consuming. These systems further assume that the underlying database, such as Freebase, is accessible and can be queried indefinitely, which is prohibitive when learning from constrained user interfaces, such as Web pages. Last but not least, current systems focus on training offline in a closed loop where users are excluded from the system's inference process; they lack the capability to continuously learn from users.
In this thesis, we address the drawbacks of the existing systems and propose data-efficient and user-centric solutions. We classify the natural language inference problem along two different perspectives: accessibility of the system functions (unconstrained or constrained user interfaces) and the nature of user involvement during inference (non-interactive or interactive user interfaces). We first develop neural-network-based systems for non-interactive and unconstrained user interfaces with different data types (i.e., structured and unstructured).
The system is trained to learn a continuous representation of a user utterance, and to generate and rank candidate answers from the underlying database using this representation. We augment these systems with an extractive candidate refinement framework by integrating task-oriented human-machine dialogues. Our system is able to understand, point to, and refine the errors in candidates by asking users validation questions and offering alternatives. We also address the limitations of unconstrained user interfaces and propose reinforcement learning methods to develop policies that are capable of learning from more constrained web interfaces. The policies are trained on a variety of web pages, such as flight booking and social media interaction, with task-based reward signals and no human supervision. We test the performance of our models with simulated as well as real users. Empirical results show that the proposed models are able to learn from limited supervised data and hold successful dialogues with users. We observe improvements in answer prediction accuracy, task success rate, and real user ratings.
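The generate-and-rank step described above can be sketched as follows. This is a hypothetical illustration: the learned neural encoder of the thesis is replaced here by a toy bag-of-words embedding, and the vocabulary and candidate strings are invented for the example.

```python
# Hypothetical sketch of ranking candidate answers by similarity to a
# continuous utterance representation. The encoder is a toy bag-of-words
# embedding standing in for the learned neural representation.
import numpy as np

VOCAB = ["book", "flight", "weather", "ticket", "city"]

def embed(text):
    """Toy bag-of-words embedding over a fixed vocabulary."""
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def rank_candidates(utterance, candidates):
    """Score candidates by cosine similarity to the utterance embedding."""
    u = embed(utterance)
    def score(c):
        v = embed(c)
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / denom) if denom else 0.0
    return sorted(candidates, key=score, reverse=True)

ranked = rank_candidates(
    "book a flight ticket",
    ["weather in a city", "book flight ticket"])
```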
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization
We tackle real-world problems with complex structures beyond pixel-based
games or simulators. We formulate this as a few-shot reinforcement learning
problem in which a task is characterized by a subtask graph that defines a set
of subtasks and their dependencies, which are unknown to the agent. Unlike
previous meta-RL methods that try to directly infer an unstructured task
embedding, our multi-task subtask graph inferencer (MTSGI) first infers the
common high-level task structure in terms of the subtask graph from the
training tasks, and uses it as a prior to improve task inference at test time.
Our experimental results on 2D grid-world and complex web navigation domains show
that the proposed method can learn and leverage the common underlying structure
of the tasks for faster adaptation to the unseen tasks than various existing
algorithms such as meta reinforcement learning, hierarchical reinforcement
learning, and other heuristic agents.
Comment: Accepted to UAI 2022 as an oral presentation.
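A subtask graph of the kind described above can be represented minimally as a map from each subtask to its precondition set. This is an illustrative sketch with invented subtask names; the actual MTSGI procedure infers such a structure from training tasks rather than having it given.

```python
# Illustrative subtask graph for a hypothetical web navigation task:
# each subtask maps to the set of subtasks that must be completed first.
SUBTASK_GRAPH = {
    "open_site": set(),          # no preconditions
    "login": {"open_site"},
    "search_item": {"login"},
    "checkout": {"search_item"},
}

def eligible(completed):
    """Subtasks whose preconditions are all satisfied and not yet done."""
    return {s for s, pre in SUBTASK_GRAPH.items()
            if pre <= completed and s not in completed}

nxt = eligible({"open_site", "login"})
```

Knowing the dependency structure prunes the agent's choices to the currently eligible subtasks, which is what lets a graph prior speed up adaptation on unseen tasks.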